Abstract. In this paper we explore the problem of fitting a 3D morphable
model to single face images using only sparse geometric features
(edges and landmark points). Previous approaches to this problem are
based on nonlinear optimisation of an edge-derived cost that can be
viewed as forming soft correspondences between model and image edges.
We propose a novel approach that explicitly computes hard correspondences.
The resulting objective function is non-convex but we show that
a good initialisation can be obtained efficiently using alternating linear
least squares in a manner similar to the iterated closest point algorithm.
We present experimental results on both synthetic and real images and
show that our approach outperforms methods that use soft correspondence
and other recent methods that rely solely on geometric features.


1 Introduction 

Estimating 3D face shape from one or more 2D images is a longstanding problem 
in computer vision. It has a wide range of applications from pose-invariant face 
recognition [1] to creation of 3D avatars from 2D images [2]. One of the most
successful approaches to this problem is to use a statistical model of 3D face 
shape [3]. This transforms the problem of shape estimation to one of model 
fitting and provides a strong statistical prior to constrain the problem. 

The model fitting objective can be formulated in various ways, the most 
obvious being an analysis-by-synthesis approach in which appearance error is 
directly optimised [3]. However, feature-based methods [4,5] are in general more
robust and lead to optimisation problems less prone to convergence on local 
minima. In this paper, we focus on fitting to edge features in images. 

Image edges convey important information about a face. The occluding boundary
provides direct information about 3D shape, for example a profile view
reveals strong information about the shape of the nose. Internal edges, caused by
texture changes, high curvature or self occlusion, provide information about the
position and shape of features such as lips, eyebrows and the nose. This information
provides a cue for estimating 3D face shape from 2D images or, more
generally, for fitting face models to images.

In Section 2 we introduce relevant background. In Section 3 we present a
method for fitting to landmarks with known model correspondence. Our key
contribution is in Section 4 where we present a novel, fully automatic algorithm
for fitting to image edges with hard correspondence. By hard correspondence,
we mean that an explicit correspondence is computed between projected model
vertex and edge pixel. For comparison, in Section 5 we describe our variant of
previous methods [4,6,7] that fit to edges using soft correspondence. By soft
correspondence, we mean that an energy term that captures many possible edge
correspondences is minimised. Finally, in Section 6 we compare the two approaches
experimentally, along with others from the recent literature.

1.1 Related Work 

Landmark fitting. 2D landmarks have long been used as a way to initialise
a morphable model fit [3]. Breuer et al. [8] obtained this initialisation using a
landmark detector, providing a fully automatic system. More recently, landmarks
have been shown to be sufficient for obtaining useful shape estimates in their
own right [9]. Furthermore, noisily detected landmarks can be filtered using a
model [10] and automatic landmark detection can be integrated into a fitting
algorithm [11]. In a similar manner to landmarks, local features can be used to
aid the fitting process [5].

Edge fitting. An early example of using image edges for face model fitting is the
Active Shape Model (ASM) [12] where a 2D boundary model is aligned to image
edges. In 3D, contours have been used directly for 3D face shape estimation [13]
and indirectly as a feature for fitting a 3DMM. The earliest work in this direction
was due to Moghaddam et al. [14] who fitted a 3DMM to silhouettes extracted
from multiple views. From a theoretical standpoint, Lüthi et al. [15] explored to
what degree face shape is constrained when contours are fixed.

Romdhani et al. [4] include an edge distance cost as part of a hybrid energy
function. Texture and outer (silhouette) contours are used in a similar way to
LM-ICP [16] where correspondence between image edges and model contours
is "soft". This is achieved by applying a distance transform to an edge image.
This provides a smoothly varying cost surface whose value at a pixel indicates
the distance (and its gradient, the direction) to the closest edge. This idea was
extended by Amberg et al. [6] who use it in a multi-view setting and smooth the
edge distance cost by averaging results with different parameters. In this way, the
cost surface also encodes the saliency of an edge. Keller et al. [7] showed that such
approaches lead to a cost function that is neither continuous nor differentiable.
This suggests the optimisation method must be carefully chosen.

Edge features have also been used in other ways. Cashman and Fitzgibbon
[17] learn a 3DMM from 2D images by fitting to silhouettes. Zhu et al. [18]
present a method that can be seen as a hybrid of landmark and edge fitting.
Landmarks that define boundaries are allowed to slide over the 3D face surface
during fitting. A recent alternative to optimisation-based approaches is to learn
a regressor from extracted face contours to 3DMM shape parameters [19].

Fitting a 3DMM to a 2D image using only geometric features (i.e. landmarks
and edges) is essentially a non-rigid alignment problem. Surprisingly, the idea
of employing an iterated closest point [20] approach with hard edge correspondences
(in a similar manner to ASM fitting) has been discounted in the literature
[4]. In this paper, we pursue this idea and develop an iterative 3DMM fitting
algorithm that is fully automatic, simple and efficient (and we make our
implementation available, see footnote 1). Instead of working in a transformed
distance-to-edge space and treating correspondences as "soft", we compute an
explicit correspondence between model and image edges. This allows us to treat
the model edge vertices as landmarks with known 2D position, for which optimal
pose or shape estimates can be easily computed.

State of the art. The most recent face shape estimation methods are able to obtain
considerably higher quality results than the purely model-based approaches
above. They do so by using pixel-wise shading or motion information to apply
finescale refinement to an initial shape estimate. For example, Suwajanakorn et
al. [21] use photo collections to build an average model of an individual which is
then fitted to a video and finescale detail added by optical flow and
shape-from-shading. Cao et al. [22] take a machine learning approach and train a regressor
that predicts high resolution shape detail from local appearance.

Our aim in this paper is not to compete directly with these methods. Rather,
we seek to understand what quality of face reconstruction it is possible to obtain
using solely sparse, geometric information. The output of our method may provide
a better initialisation for state of the art refinement techniques or remove
the need to have a person specific model.

2 Preliminaries 

Our approach is based on fitting a 3DMM to face images under the assumption
of a scaled orthographic projection. Hence, we begin by introducing scaled
orthographic projection and 3DMMs.


2.1 Scaled Orthographic Projection 

The scaled orthographic, or weak perspective, projection model assumes that 
variation in depth over the object is small relative to the mean distance from 
camera to object. Under this assumption, the projected 2D position of a 3D point 
$\mathbf{v} = [u\ v\ w]^T$, given by $\mathrm{SOP}[\mathbf{v}, \mathbf{R}, \mathbf{t}, s] \in \mathbb{R}^2$, does not depend on the distance of
the point from the camera, but only on a uniform scale $s$ given by the ratio of
the focal length of the camera and the mean distance from camera to object:

$$\mathrm{SOP}[\mathbf{v}, \mathbf{R}, \mathbf{t}, s] = s \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \mathbf{R}\mathbf{v} + s\mathbf{t}, \tag{1}$$

where the pose parameters $\mathbf{R} \in \mathbb{R}^{3 \times 3}$, $\mathbf{t} \in \mathbb{R}^2$ and $s \in \mathbb{R}^+$ are a rotation matrix,
2D translation and scale respectively.

1 Matlab implementation: github.com/wapsl01/3DMM_edges
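As an illustration of Equation (1), the following is a minimal NumPy sketch of the scaled orthographic projection. The function name `sop` and the row-vector array layout are assumptions made for this example; they are not taken from the authors' Matlab implementation.

```python
import numpy as np

def sop(v, R, t, s):
    """Scaled orthographic projection of 3D points (illustrative sketch).

    v: (N, 3) array of 3D points, R: (3, 3) rotation matrix,
    t: (2,) translation, s: positive scale. Returns an (N, 2) array.
    """
    v = np.atleast_2d(np.asarray(v, dtype=float))
    # Orthographic projection matrix: keeps u and v, drops the depth w.
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    # Row-vector form of s * P @ R @ v + s * t.
    return s * (v @ R.T @ P.T) + s * np.asarray(t, dtype=float)
```

With an identity rotation, two points that differ only in depth project to the same 2D position, which is the depth independence described above.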


2.2 3D Morphable Model 

A 3D morphable model is a deformable mesh whose shape is determined by the 
shape parameters $\boldsymbol{\alpha} \in \mathbb{R}^S$. Shape is described by a linear model learnt from data
using Principal Components Analysis (PCA). So, the shape of any face can be
approximated as:

$$\mathbf{f}(\boldsymbol{\alpha}) = \mathbf{P}\boldsymbol{\alpha} + \bar{\mathbf{f}}, \tag{2}$$

where $\mathbf{P} \in \mathbb{R}^{3N \times S}$ contains the $S$ principal components, $\bar{\mathbf{f}} \in \mathbb{R}^{3N}$ is the mean
shape and the vector $\mathbf{f}(\boldsymbol{\alpha}) \in \mathbb{R}^{3N}$ contains the coordinates of the $N$ vertices,
stacked to form a long vector: $\mathbf{f} = [u_1\ v_1\ w_1\ \dots\ u_N\ v_N\ w_N]^T$. Hence, the $i$th
vertex is given by: $\mathbf{v}_i = [f_{3i-2}\ f_{3i-1}\ f_{3i}]^T$. For convenience, we denote the
sub-matrix corresponding to the $i$th vertex as $\mathbf{P}_i \in \mathbb{R}^{3 \times S}$ and the corresponding
vertex in the mean face shape as $\bar{\mathbf{f}}_i \in \mathbb{R}^3$, such that the $i$th vertex is given by:
$\mathbf{v}_i = \mathbf{P}_i\boldsymbol{\alpha} + \bar{\mathbf{f}}_i$. Similarly, we define the row corresponding to the $u$ component
of the $i$th vertex as $\mathbf{P}_{iu}$ (similarly for $v$ and $w$) and define the $u$ component of
the $i$th mean shape vertex as $\bar{f}_{iu}$ (similarly for $v$ and $w$).


3 Fitting with Known Correspondence 

We begin by showing how to fit a morphable model to L observed 2D positions 
$\mathbf{x}_i = [x_i\ y_i]^T$ ($i = 1 \dots L$) arising from the projection of corresponding vertices
in the morphable model. We discuss in Section 4 how these correspondences
are obtained in practice. Without loss of generality, we assume that the $i$th 2D
position corresponds to the $i$th vertex in the morphable model. The objective
of fitting a morphable model to these observations is to obtain the shape and
pose parameters that minimise the reprojection error, $E_{\mathrm{lmk}}$, between observed
and predicted 2D positions:

$$E_{\mathrm{lmk}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) = \frac{1}{L} \sum_{i=1}^{L} \left\| \mathbf{x}_i - \mathrm{SOP}\left[\mathbf{P}_i\boldsymbol{\alpha} + \bar{\mathbf{f}}_i, \mathbf{R}, \mathbf{t}, s\right] \right\|^2. \tag{3}$$

The scale factor in front of the summation makes the magnitude of the error
invariant to the number of landmarks. This problem is multilinear in the
shape parameters and the SOP transformation matrix. It is also nonlinearly
constrained, since $\mathbf{R}$ must be a valid rotation matrix. Although minimising $E_{\mathrm{lmk}}$
is a non-convex optimisation problem, a good initialisation can be obtained using
alternating linear least squares and this estimate subsequently refined using
nonlinear optimisation. This is the approach that we take.

3.1 Pose Estimation


We make an initial estimate of $\mathbf{R}$, $\mathbf{t}$ and $s$ using a simple extension of the POS
algorithm [23]. Compared to POS, we additionally enforce that $\mathbf{R}$ is a valid
rotation matrix. We begin by solving an unconstrained system in a least squares
sense. We stack two copies of the 3D points in homogeneous coordinates, such
that $\mathbf{A}_{2i-1} = [u_i\ v_i\ w_i\ 1\ 0\ 0\ 0\ 0]$ and $\mathbf{A}_{2i} = [0\ 0\ 0\ 0\ u_i\ v_i\ w_i\ 1]$, and form a long
vector of the corresponding 2D points $\mathbf{d} = [x_1\ y_1\ \cdots\ x_L\ y_L]^T$. We then solve
for $\mathbf{k} \in \mathbb{R}^8$ in $\mathbf{A}\mathbf{k} = \mathbf{d}$ using linear least squares. We define $\mathbf{r}_1 = [k_1\ k_2\ k_3]$ and
$\mathbf{r}_2 = [k_5\ k_6\ k_7]$. Scale is given by $s = (\|\mathbf{r}_1\| + \|\mathbf{r}_2\|)/2$ and the translation vector
by $\mathbf{t} = [k_4/s\ \ k_8/s]^T$. We perform singular value decomposition on the matrix
formed from $\mathbf{r}_1$ and $\mathbf{r}_2$:

$$\mathbf{U}\mathbf{S}\mathbf{V}^T = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_1 \times \mathbf{r}_2 \end{bmatrix}. \tag{4}$$

The rotation matrix is given by $\mathbf{R} = \mathbf{U}\mathbf{V}^T$. If $\det(\mathbf{R}) = -1$ then we negate
the third row of $\mathbf{U}$ and recompute $\mathbf{R}$. This guarantees that $\mathbf{R}$ is a valid rotation
matrix. This approach gives a good initial estimate which we subsequently refine
with nonlinear optimisation of $E_{\mathrm{lmk}}$ with respect to $\mathbf{R}$, $\mathbf{t}$ and $s$.
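The linear pose initialisation described above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the function name `estimate_pose` and the array shapes are assumptions, and the subsequent nonlinear refinement is not included.

```python
import numpy as np

def estimate_pose(X, x):
    """Linear pose initialisation from 3D-2D correspondences (sketch).

    X: (L, 3) model points, x: (L, 2) observed image points.
    Returns rotation R (3, 3), translation t (2,) and scale s.
    """
    L = X.shape[0]
    Xh = np.hstack([X, np.ones((L, 1))])   # homogeneous 3D points
    A = np.zeros((2 * L, 8))
    A[0::2, :4] = Xh                        # rows A_{2i-1}: x equations
    A[1::2, 4:] = Xh                        # rows A_{2i}:   y equations
    d = x.reshape(-1)                       # [x_1, y_1, ..., x_L, y_L]
    k, *_ = np.linalg.lstsq(A, d, rcond=None)
    r1, r2 = k[0:3], k[4:7]
    s = (np.linalg.norm(r1) + np.linalg.norm(r2)) / 2.0
    t = np.array([k[3], k[7]]) / s
    # Project [r1; r2; r1 x r2] onto the closest orthogonal matrix.
    U, _, Vt = np.linalg.svd(np.vstack([r1, r2, np.cross(r1, r2)]))
    R = U @ Vt
    if np.linalg.det(R) < 0:
        U[2, :] *= -1                       # negate third row of U, as in the text
        R = U @ Vt
    return R, t, s
```

For noise-free points generated by a scaled orthographic projection, this recovers the generating pose up to numerical precision. With noisy landmarks it only provides the starting point for the nonlinear refinement.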


3.2 Shape Estimation 

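With a fixed pose estimate, shape parameter estimation under scaled orthographic
projection is a linear problem. The 2D position of the $i$th vertex as
a function of the shape parameters is given by: $s\mathbf{R}_{1..2}(\mathbf{P}_i\boldsymbol{\alpha} + \bar{\mathbf{f}}_i) + s\mathbf{t}$. Hence,
each observed vertex adds two equations to a linear system. Concretely, for each
image we form the matrix $\mathbf{C} \in \mathbb{R}^{2L \times S}$ where

$$\mathbf{C}_{2i-1} = s(R_{11}\mathbf{P}_{iu} + R_{12}\mathbf{P}_{iv} + R_{13}\mathbf{P}_{iw})$$

and

$$\mathbf{C}_{2i} = s(R_{21}\mathbf{P}_{iu} + R_{22}\mathbf{P}_{iv} + R_{23}\mathbf{P}_{iw})$$

and vector $\mathbf{h} \in \mathbb{R}^{2L}$ where

$$h_{2i-1} = x_i - s(\mathbf{R}_1\bar{\mathbf{f}}_i + t_1) \quad \text{and} \quad h_{2i} = y_i - s(\mathbf{R}_2\bar{\mathbf{f}}_i + t_2).$$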

We solve Ca = h in a least squares sense subject to an additional constraint 
to ensure plausibility of the solution. We follow Brunton et al. [24] and use 
a hyperbox constraint on the shape parameters. This avoids having to choose 
a regularisation weight but ensures that each parameter lies within k standard 
deviations of the mean by introducing a linear inequality constraint on the shape 
parameters (we use k = 3 in our experiments). Hence, the problem can be solved 
in closed form as an inequality constrained linear least squares problem. 
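A minimal sketch of this bounded linear solve is given below. It builds $\mathbf{C}$ and $\mathbf{h}$ as defined above and enforces the hyperbox by projected gradient descent, which is a swapped-in solver chosen here only to keep the example dependency-free; the text does not say which constrained least squares solver is used. The function name, the `(L, 3, S)` layout of the basis, and the argument `sigma` (assumed to hold the per-parameter standard deviations) are all assumptions of this example.

```python
import numpy as np

def estimate_shape(P, fbar, x, R, t, s, sigma, k=3.0, iters=500):
    """Box-constrained linear least squares for shape (illustrative sketch).

    P: (L, 3, S) per-vertex basis blocks P_i, fbar: (L, 3) mean vertices,
    x: (L, 2) observed positions, sigma: (S,) parameter standard deviations.
    """
    L, _, S = P.shape
    R2 = R[:2, :]                                  # first two rows of R
    # Two rows per vertex: C_{2i-1} and C_{2i} from the text.
    C = s * np.einsum('ab,lbs->las', R2, P).reshape(2 * L, S)
    h = (x - s * (fbar @ R2.T + t)).reshape(-1)
    lo, hi = -k * sigma, k * sigma                 # hyperbox bounds
    # Start from the clipped unconstrained solution.
    alpha = np.clip(np.linalg.lstsq(C, h, rcond=None)[0], lo, hi)
    step = 1.0 / np.linalg.norm(C, 2) ** 2         # 1 / Lipschitz constant
    for _ in range(iters):
        grad = C.T @ (C @ alpha - h)               # gradient of 0.5 * ||C a - h||^2
        alpha = np.clip(alpha - step * grad, lo, hi)
    return alpha
```

When the unconstrained solution already lies inside the hyperbox, the loop leaves it unchanged. Otherwise the iterates stay inside the bounds and move towards the constrained minimiser, since the problem is convex.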
3.3 Nonlinear Refinement

Having alternated pose and shape estimation for a fixed number of iterations,
finally we perform nonlinear optimisation of $E_{\mathrm{lmk}}$ over $\boldsymbol{\alpha}$, $\mathbf{R}$, $\mathbf{t}$ and $s$
simultaneously. We represent $\mathbf{R}$ in axis-angle space to ensure that it remains a valid
rotation matrix and we retain the hyperbox constraint on $\boldsymbol{\alpha}$. We minimise $E_{\mathrm{lmk}}$
using the trust-region-reflective algorithm [25] as implemented in the Matlab
lsqnonlin function.


4 Fitting with Hard Edge Correspondence 

The method in Section 3 enables a 3DMM to be fitted to 2D landmark positions
if the correspondence between landmarks and model vertices is known. Edges, 
for example caused by occluding boundaries, do not have a fixed correspondence 
to model vertices. Hence, fitting to edges requires shape and pose estimation 
to happen in conjunction with establishing correspondence between image and 
model edges. Our proposed approach establishes these correspondences explicitly 
by finding the closest image edge to each model boundary vertex (subject to 
additional filtering to remove unreliable matches). Our method comprises the 
following steps: 

1. Detect facial landmarks 

2. Initialise shape and pose estimates by fitting to landmarks only 

3. Improve initialisation using iterated closest edge fitting 

4. Nonlinear optimisation of hybrid objective function containing landmark, 
edge and prior terms 

We describe each of these steps in more detail in the rest of this section. 

4.1 Landmarks 

We use landmarks both for initialisation and as part of our overall objective 
function as one cue for shape estimation. We apply a facial landmark detector 
that is suitable for operating on “in the wild” images. This provides approximate 
positions of facial landmarks for which we know the corresponding vertices in the 
morphable model. We use these landmark positions to make an initial estimate 
of the pose and shape parameters by running the method in Section 3 with only
these corresponding landmark positions. Note that any facial landmark detector
can be used at this stage. In our experiments, we show results with a recent
landmark detection algorithm [26] that achieves state-of-the-art performance
and for which code is provided by the authors. In our experimental evaluation,
we include the results of fitting to landmarks only.

4.2 Edge Cost

We assume that a subset of pixels have been labelled as edges and stored as the
set $\mathcal{E} = \{(x, y) \mid (x, y) \text{ is an edge}\}$. In practice, we compute edges by applying the
Canny edge detector with a fixed threshold to the input image.

Model contours are computed based on the pose and shape parameters as
the occluding boundary of the 3D face. The set of occluding boundary vertices,
$\mathcal{B}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s)$, are defined as those lying on a mesh edge whose adjacent faces
have a change of visibility. This definition encompasses both outer (silhouette)
and inner (self-occluding) contours. Since the viewing direction is aligned with
the z-axis, this is tested simply by checking if the sign of the z-component of the
triangle normal changes on either side of the edge. In addition, we check that
potential edge vertices are not occluded by another part of the mesh (using
z-buffering) and we ignore edges that lie on a mesh boundary since they introduce
artificial edges. In this paper, we deal only with occluding contours (both inner
and outer). If texture contours were defined on the surface of the morphable
model, it would be straightforward to include these in our approach.
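The sign-change test for occluding boundary vertices can be sketched as follows. This is a simplified illustration: the function name is hypothetical, the mesh is assumed to have consistent triangle winding and to be already rotated so that the viewing direction is the z-axis, and the z-buffer occlusion check mentioned above is omitted.

```python
import numpy as np

def occluding_boundary_vertices(V, F):
    """Vertices on edges whose two adjacent faces differ in visibility (sketch).

    V: (N, 3) vertices in camera coordinates (viewing direction = z-axis),
    F: (M, 3) integer triangle indices. Returns a sorted list of vertex indices.
    """
    # z-component of each (unnormalised) triangle normal.
    e1 = V[F[:, 1]] - V[F[:, 0]]
    e2 = V[F[:, 2]] - V[F[:, 0]]
    nz = np.cross(e1, e2)[:, 2]
    # Map each undirected mesh edge to the faces that share it.
    edge_faces = {}
    for f, tri in enumerate(F):
        for a, b in ((0, 1), (1, 2), (2, 0)):
            key = tuple(sorted((int(tri[a]), int(tri[b]))))
            edge_faces.setdefault(key, []).append(f)
    boundary = set()
    for (i, j), faces in edge_faces.items():
        # Edges with a single adjacent face lie on the mesh boundary: ignored.
        if len(faces) == 2 and nz[faces[0]] * nz[faces[1]] < 0:
            boundary.update((i, j))
    return sorted(boundary)
```

For an octahedron viewed along the z-axis, for example, this returns the four equatorial vertices and excludes the two poles.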

We define the objective function for edge fitting with hard correspondence as 
the sum of squared distances between each projected occluding boundary vertex 
and the closest edge pixel: 


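$$E_{\mathrm{edge}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) = \frac{1}{|\mathcal{B}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s)|} \sum_{i \in \mathcal{B}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s)} \min_{(x, y) \in \mathcal{E}} \left\| [x\ y]^T - \mathrm{SOP}\left[\mathbf{P}_i\boldsymbol{\alpha} + \bar{\mathbf{f}}_i, \mathbf{R}, \mathbf{t}, s\right] \right\|^2. \tag{5}$$

Note that the minimum operator is responsible for computing the hard
correspondences. This objective is non-convex since the minimum of a set of convex
functions is not convex [27]. Hence, we require a good initialisation to ensure
convergence to a minimum close to the global optimum. Fitting to landmarks only
does not provide a sufficiently good initialisation. So, in the next subsection we
describe a method for obtaining a good initial fit to edges, before incorporating
the edge cost into a hybrid objective function in Section 4.5.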


4.3 Iterated Closest Edge Fitting 

We propose to refine the landmark-only fit with an initial fit to edges that
works in an iterated closest point manner. That is, for each projected model
contour vertex, we find the closest image edge pixel and we treat this as a known
correspondence. In conjunction with the landmark correspondences, we again run
the method in Section 3. This leads to updated pose and shape parameters, and
in turn to updated model edges and correspondences. We iterate this process for
a fixed number of iterations. We refer to this process as Iterated Closest Edge
Fitting (ICEF) and provide an illustration in Figure 1. On the left we show an
input image with the initial landmark detection result. In the middle we show
the initial shape and pose obtained by fitting only to landmarks. On the right
we show image edge pixels in blue and projected model contours in green (where
nearest neighbour edge correspondence is considered reliable) and in red (where
correspondence is considered unreliable). The green/blue correspondences are
used for the next iteration of fitting.

Fig. 1. Iterated closest edge fitting for initialisation of the edge fitting process.
Left: input image with automatically detected landmarks. Middle: overlaid shape
obtained by fitting only to landmarks. Right: image edges in blue, model boundary
vertices with image correspondences in green, unreliable correspondences in red.

Finding the image edge pixel closest to a projected contour vertex can be done
efficiently by storing the image edge pixels in a k-d tree. We filter the resulting
correspondences using two commonly used heuristics. First, we remove the 5% of
matches for which the distance to the closest image edge pixel is largest.
Second, we remove matches for which the image distance divided by $s$ exceeds
a threshold (chosen as 10 in our experiments). The division by scale factor $s$
makes this choice invariant to changes in image resolution.
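The correspondence search and the two filtering heuristics can be sketched as below. To keep the example dependency-free it uses a brute-force nearest-neighbour search in place of the tree-based search, which returns the same matches but scales poorly. The function name, the rounding up of the 5% cut and the argument layout are assumptions of this sketch.

```python
import numpy as np

def closest_edge_matches(proj, edges, s, trim=0.05, max_dist=10.0):
    """Hard edge correspondences with outlier filtering (illustrative sketch).

    proj: (B, 2) projected boundary vertices, edges: (E, 2) edge pixels,
    s: pose scale. Returns indices of the kept rows of proj and the
    matched edge pixel for each kept row.
    """
    # Brute-force nearest neighbour over all vertex/edge-pixel pairs.
    d2 = ((proj[:, None, :] - edges[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(proj)), nearest])
    # Heuristic 1: drop the fraction `trim` of matches with largest distance.
    n_drop = int(np.ceil(trim * len(proj)))
    keep = np.argsort(dist)[:len(proj) - n_drop]
    # Heuristic 2: drop matches whose scale-normalised distance is too large.
    keep = np.sort(keep[dist[keep] / s <= max_dist])
    return keep, edges[nearest[keep]]
```

The kept vertex indices and their matched pixels can then be appended to the landmark correspondences and passed to the fitting method of Section 3 for the next iteration.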


4.4 Prior 


Under the assumption that the training data of the 3DMM forms a Gaussian
cloud in high dimensional space, we expect that each of the shape parameters
follows a normal distribution with zero mean and variance given by the
eigenvalue, $\lambda_i$, associated with the corresponding principal component. We find
that including a prior term that captures this assumption significantly improves
performance over using the hyperbox constraint alone. The prior penalises
deviation from the mean shape as follows:



$$E_{\mathrm{prior}}(\boldsymbol{\alpha}) = \sum_{i=1}^{S} \frac{\alpha_i^2}{\lambda_i}. \tag{6}$$

Fig. 2. Edge cost surface with soft correspondence (right) computed from input
image (left)

4.5 Nonlinear Refinement 

Finally, we perform nonlinear optimisation of a hybrid objective function comprising
landmark, edge and prior terms:

$$E(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) = w_1 E_{\mathrm{lmk}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) + w_2 E_{\mathrm{edge}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) + w_3 E_{\mathrm{prior}}(\boldsymbol{\alpha}), \tag{7}$$

where $w_1$, $w_2$ and $w_3$ weight the contribution of each term to the overall energy.
The landmark and edge terms are invariant to the number of landmarks and
edge vertices which means we do not have to tune the weights for each image (for
example, for the results in Table 1 we use fixed values of: $w_1 = 0.15$, $w_2 = 0.45$
and $w_3 = 0.4$). We retain the hyperbox constraint and so the hybrid objective is
a constrained nonlinear least squares problem and we again optimise using the
trust-region-reflective algorithm.

For efficiency and to avoid problems of continuity and differentiability of the
edge cost function, we follow [6] and keep occluding boundary vertices, $\mathcal{B}$, fixed
for a number of iterations of the optimiser. After a number of iterations, we
recompute the vertices lying on the occluding boundary and restart the optimiser.

5 Fitting with Soft Edge Correspondence 

We compare our approach with a method based on optimising an edge cost
function, in the same spirit as previous work [4,6,7]. We follow the same approach
as Amberg et al. [6] to compute the edge cost function, however we further
improve robustness by also integrating over scale. For our edge detector, we use
gradient magnitude thresholding with non-maxima suppression. Given a set of
edge detector sensitivity thresholds $\mathcal{T}$ and scales $\mathcal{S}$, we compute $n = |\mathcal{T} \times \mathcal{S}|$
edge images, $E^1, \dots, E^n$, using each pair of image scale and threshold values.
We compute the Euclidean distance transform, $D^1, \dots, D^n$, for each edge image
(i.e. the value of each pixel in $D^i$ is the distance to the closest edge pixel in $E^i$).
Finally, we compute the edge cost surface as:



$$S = \frac{1}{n} \sum_{i=1}^{n} \frac{D^i}{D^i + \kappa}. \tag{8}$$

Fig. 3. Synthetic input images for one subject


                                              Rotation angle
Method                       -70°   -50°   -30°   -15°     0°    15°    30°    50°    70°   Mean
Average face                 3.35   3.35   3.35   3.35   3.35   3.35   3.35   3.35   3.35   3.35
Proposed (landmarks only)    2.67   2.60   2.58   2.64   2.56   2.49   2.50   2.54   2.63   2.58
Aldrian and Smith [9]        2.64   2.60   2.55   2.54   2.49   2.42   2.43   2.44   2.54   2.52
Romdhani et al. [4] (soft)   2.65   2.59   2.58   2.61   2.59   2.50   2.50   2.46   2.51   2.55
Proposed (ICEF)              2.38   2.40   2.51   2.38   2.52   2.45   2.43   2.38   2.3    2.42
Proposed (hard)              2.35   2.26   2.38   2.40   2.51   2.39   2.40   2.20   2.26   2.35

Table 1. Mean Euclidean vertex distance (mm) with ground truth landmarks


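The parameter $\kappa$ determines the influence range of an edge in an adaptive manner.
Amberg et al. [6] suggest a value for $\kappa$ of 1/20th the expected size of the
head in pixels. We compute this parameter automatically from the scale $s$. An
example of an edge cost surface is shown in Figure 2. To evaluate the edge cost,
we compute model contour vertices as in Section 4.2, project them into the image
and interpolate the edge cost function using bilinear interpolation:

$$E_{\mathrm{softedge}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) = \frac{1}{|\mathcal{B}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s)|} \sum_{i \in \mathcal{B}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s)} S\left(\mathrm{SOP}\left[\mathbf{P}_i\boldsymbol{\alpha} + \bar{\mathbf{f}}_i, \mathbf{R}, \mathbf{t}, s\right]\right). \tag{9}$$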
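Evaluating the soft edge cost amounts to sampling the cost surface at the projected contour vertices with bilinear interpolation and averaging. A minimal sketch follows; the function name, the zero-based `(x, y)` pixel convention with the surface indexed as `S[y, x]`, and the clamping of points that fall outside the image are assumptions of this example.

```python
import numpy as np

def soft_edge_cost(S, pts):
    """Mean bilinearly interpolated edge cost (illustrative sketch).

    S: (H, W) edge cost surface with H, W >= 2, indexed as S[y, x];
    pts: (B, 2) projected (x, y) positions of the contour vertices.
    """
    H, W = S.shape
    # Clamp points to the image so that all four neighbours exist.
    x = np.clip(pts[:, 0], 0, W - 1)
    y = np.clip(pts[:, 1], 0, H - 1)
    x0 = np.minimum(np.floor(x).astype(int), W - 2)
    y0 = np.minimum(np.floor(y).astype(int), H - 2)
    ax, ay = x - x0, y - y0                  # fractional offsets in [0, 1]
    top = (1 - ax) * S[y0, x0] + ax * S[y0, x0 + 1]
    bot = (1 - ax) * S[y0 + 1, x0] + ax * S[y0 + 1, x0 + 1]
    return float(np.mean((1 - ay) * top + ay * bot))
```

Because the interpolated value varies continuously with the projected position for a fixed set of contour vertices, this term can be passed to a gradient-based optimiser between recomputations of the occluding boundary.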

As with the hard edge cost, we found that the best performance was achieved 
by also including the landmark and prior terms in a hybrid objective function. 
Hence, we minimise: 


$$E(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) = w_1 E_{\mathrm{lmk}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) + w_2 E_{\mathrm{softedge}}(\boldsymbol{\alpha}, \mathbf{R}, \mathbf{t}, s) + w_3 E_{\mathrm{prior}}(\boldsymbol{\alpha}). \tag{10}$$

We again initialise by fitting to landmarks only using the method in Section
4.1, retain the hyperbox constraint and optimise using the trust-region-reflective
algorithm. We use the same weights as for the hard correspondence method in
our experiments.


6 Experimental Results 

We present two sets of experimental results. First, we use synthetic images with 
known ground truth 3D shape in order to quantitatively evaluate our method 
and provide comparison to previous work. Second, we use real images to provide 
qualitative evidence of the performance of our method in uncontrolled conditions. 
For the 3DMM in both sets of experiments we use the Basel Face Model [28].

                                   Landmark noise std. dev.
Method                       σ = 0   σ = 1   σ = 2   σ = 3   σ = 4   σ = 5
Proposed (landmarks only)     2.58    2.60    2.61    2.68    2.76    2.85
Aldrian and Smith [9]         2.52    2.53    2.55    2.62    2.65    2.73
Romdhani et al. [4] (soft)    2.55    2.57    2.57    2.62    2.70    2.76
Proposed (ICEF)               2.42    2.43    2.43    2.50    2.57    2.60
Proposed (hard)               2.35    2.36    2.35    2.39    2.47    2.50

Table 2. Mean Euclidean vertex distance (mm) with noisy landmarks


                                              Rotation angle
Method                       -70°   -50°   -30°   -15°     0°    15°    30°    50°    70°   Mean
Proposed (landmarks only)    6.79   6.84   5.19   5.74   5.68   6.34   6.48   7.04   7.74   6.43
Zhu et al. [18]               N/A    N/A   4.63   5.09   4.19   5.22   4.92    N/A    N/A    N/A
Romdhani et al. [4] (soft)   4.46   3.42   3.66   3.78   3.77   3.57   4.31   4.19   4.73   3.99
Proposed (ICEF)              3.70   3.32   3.26   3.23   3.37   3.50   3.43   4.07   3.52   3.49
Proposed (hard)              3.43   3.20   3.19   3.09   3.30   3.36   3.36   3.84   3.41   3.35

Table 3. Mean Euclidean vertex distance (mm) with automatically detected
landmarks


6.1 Quantitative Evaluation 

We begin with a quantitative comparative evaluation on synthetic data. We 
use the 10 out-of-sample faces supplied with the Basel Face Model and render 
orthographic images of each face in 9 poses (rotations of 0°, ±15°, ±30°, ±50° 
and ±70° about the vertical axis). We show sample input images for one subject 
in Figure 3. In all experiments, we report the mean Euclidean distance between
ground truth and estimated face surface in mm after Procrustes alignment. 

In the first experiment, we use ground truth landmarks. Specifically, we use
the 70 Farkas landmarks, project the visible subset to the image (yielding between
37 and 65 landmarks per image) and round to the nearest pixel. In Table
1 we show results averaged over pose angle and over the whole dataset. As a
baseline, we show the error if we simply use the average face shape. We then
show the result of fitting only to landmarks, i.e. the method in Section 3. We
include two comparison methods. The approach of Aldrian and Smith [9] uses
only landmarks but with an affine camera model and a learnt model of landmark
variance. The soft edge correspondence method of Romdhani et al. [4] is
described in Section 5. The final two rows show two variants of our proposed
methods: the fast Iterated Closest Edge Fitting version and the full version with
nonlinear optimisation of the hard correspondence cost. Average performance
over the whole dataset is best for our method and, in general, using edges over
landmarks only and applying nonlinear optimisation improves performance. The
performance improvement of our methods over landmark-only methods increases
with pose angle. This suggests that edge information becomes more salient for
non-frontal poses.

Fig. 4. Qualitative frontalisation results


The second experiment is identical to the first except that we add Gaussian 
noise of varying standard deviation to the ground truth landmark positions. In 
Table 2 we show results averaged over all poses and subjects.

In the final experiment we use landmarks that are automatically detected
using the method of Zhu and Ramanan [26]. This enables us to include comparison
with the recent fitting algorithm of Zhu et al. [18]. We use the authors' own
implementation which only works with a fixed set of 68 landmarks. This means
that the method cannot be applied to the more extreme pose angles where fewer
landmarks are detected. In this more challenging scenario, our method again
gives the best overall performance and is superior for all pose angles.

Fig. 5. Qualitative pose editing results


6.2 Qualitative Evaluation 

In Figure 4 we show qualitative examples from the CMU PIE [29] dataset. Here,
we fit to images (first row) in a non-frontal pose using automatically detected
landmarks [26] and show the reconstruction in the second row. We texture map
the image onto the mesh, rotate to frontal pose (bottom row) and compare to
an actual frontal view (third row). Finally, we show qualitative examples from
the Labelled Faces in the Wild dataset [30] in Figure 5. Again, we texture map
the image to the mesh and show a range of poses. These results show that our
method is capable of robustly and fully automatically fitting to unconstrained
images.

7 Conclusions 

We have presented a fully automatic algorithm for fitting a 3DMM to single images
using hard edge correspondence and compared it to existing methods using
soft correspondence. In 3D-3D alignment, the soft correspondence of LM-ICP
[16] is demonstrably more robust than hard ICP [20]. However, in the context
of 3D-2D nonrigid alignment, a soft edge cost function is neither continuous nor
differentiable since contours appear, disappear, split and merge under parameter
changes [7]. This makes its optimisation challenging, unstable and highly
dependent on careful choice of optimisation parameters. Although our proposed
algorithm relies on potentially brittle hard correspondences, solving for shape
and pose separately requires only solution of a linear problem and, together,
optimisation of a multilinear problem. This makes iterated closest edge fitting
very fast and it provides an initialisation that allows the subsequent nonlinear
optimisation to converge to a better optimum. We believe that this explains the
improved performance over edge fitting with soft correspondence.

There are many ways this work can be extended. First, we could explore other
ways in which the notion of soft correspondence is formulated. For example,
we could borrow from SoftPOSIT [31] or Blind PnP [32] which both estimate
pose with unknown 3D-2D correspondence. Second, we could incorporate any
of the refinements to standard ICP [33]. Third, we currently use only geometric
information and do not fit texture. Finally, we would like to extend the method
to video using a model that captures expression variation and incorporating
temporal smoothness constraints.